On Improving the Performance of Data Partitioning Oriented Parallel Irregular Reductions
Authors
Abstract
Various parallelization techniques for reductions have been proposed elsewhere. In this paper we classify them into two classes: LPO (Loop Partitioning Oriented) techniques and DPO (Data Partitioning Oriented) techniques. We analyze both classes in terms of a set of performance properties: data locality, memory overhead, parallelism and workload balancing. We then propose several techniques to increase the exploited parallelism and to introduce load balancing into a DPO method. For parallelism, the solution is based on the partial expansion of the reduction array. For load balancing, a first technique is generic, in that it can deal with any kind of load imbalance present in the problem domain. A second technique handles a special case of load imbalance that appears when a large number of write operations fall on small regions of the reduction arrays. Efficient implementations of the proposed optimizations for the DWA–LIP DPO method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction methods.
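To make the LPO/DPO distinction concrete, the following is a minimal sketch, not the DWA–LIP method itself. It assumes the simplest irregular reduction, `a[idx[k]] += val[k]`, and simulates the workers sequentially. The function names (`reduce_lpo`, `reduce_dpo`) and the block-ownership rule are illustrative assumptions, not taken from the paper. The LPO variant splits the iterations and gives each worker a full private copy of the reduction array. The DPO variant splits the reduction array into blocks and lets each worker execute only the iterations that write into its own block.

```python
def reduce_sequential(n, idx, val):
    """Reference irregular reduction: a[idx[k]] += val[k]."""
    a = [0] * n
    for i, v in zip(idx, val):
        a[i] += v
    return a


def reduce_lpo(n, idx, val, workers):
    """Loop partitioning (assumed variant): full array expansion.

    Iterations are split into contiguous chunks; every worker
    accumulates into its own private copy, and the copies are
    combined at the end. Memory overhead grows with `workers`.
    """
    chunk = (len(idx) + workers - 1) // workers
    private = [[0] * n for _ in range(workers)]
    for w in range(workers):
        for k in range(w * chunk, min((w + 1) * chunk, len(idx))):
            private[w][idx[k]] += val[k]
    return [sum(p[j] for p in private) for j in range(n)]


def reduce_dpo(n, idx, val, workers):
    """Data partitioning (assumed variant): owner-computes on blocks.

    The reduction array is split into contiguous blocks, one per
    worker. An inspector pass assigns each iteration to the owner of
    the element it writes, so no private copies are needed.
    Returns the result and the number of iterations per worker.
    """
    block = (n + workers - 1) // workers
    buckets = [[] for _ in range(workers)]
    for k, i in enumerate(idx):
        buckets[i // block].append(k)  # owner of element i
    a = [0] * n
    for w in range(workers):
        for k in buckets[w]:
            a[idx[k]] += val[k]
    return a, [len(b) for b in buckets]


# Hypothetical input: most writes hit element 0, a small region.
idx = [0, 0, 1, 0, 5, 0, 7, 0]
val = [1, 2, 3, 4, 5, 6, 7, 8]
result, loads = reduce_dpo(8, idx, val, workers=4)
print(result, loads)
```

With this input most iterations are assigned to the worker that owns element 0, which is the kind of load imbalance (many writes on a small region of the reduction array) that the abstract describes for DPO methods.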
Similar resources
Efficient Data Parallel Implementations of Highly Irregular Problems
This dissertation presents optimization techniques for efficient data parallel formulation/implementation of highly irregular problems, and applies the techniques to O(N) hierarchical N–body methods for large–scale N–body simulations. It demonstrates that highly irregular scientific and engineering problems such as nonadaptive and adaptive O(N) hierarchical N–body methods can be efficiently imp...
Design and Evaluation of a Method for Partitioning and Offloading Web-based Applications in Mobile Systems with Bandwidth Constraints
Computation offloading is known to be among the effective solutions for running heavy applications on smart mobile devices. However, irregular changes in the mobile data rate have a direct impact on code partitioning while offloading is in progress. It is believed that once a rate-adaptive partitioning is performed, the replication of such substantial processes due to bandwidth fluctuation can be avoid...
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes
Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers and the shared-memory interface supported by software DSMs (TreadMarks). We introduce LocalWrite, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates th...
Evaluating Locality Optimizations For Adaptive Irregular Scientific Codes
Irregular scientific codes experience poor cache performance due to their memory access patterns. Researchers have proposed several data and computation transformations to improve locality in irregular scientific codes. We experimentally compare their performance and present GPART, a new technique based on hierarchical clustering. Quality partitions are constructed quickly by clustering multipl...
Efficient compiler and run-time support for parallel irregular reductions
Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LOCALWRITE, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, eli...
Journal title:
Volume / issue:
Pages: -
Publication date: 2002